Integrate Panorama into IndexHNSWFlatPanorama
#4621
Conversation
mdouze left a comment
Thanks for the PR.
About Panorama in general: would it be feasible to make an IndexRefine that supports FlatPanorama as a refinement index?
The reason is that it may be more efficient to do all the non-exhaustive searches in low dimension and refine the result list at the end.
This would also make it possible to apply Panorama to low-accuracy & fast indexes like FastScan and RaBitQ indexes.
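For readers unfamiliar with the suggestion, a minimal C++ sketch of the idea: run the candidate search on a fast, low-accuracy index and re-rank only the short candidate list with an exact flat index, which is where a Panorama-accelerated flat index could plug in. Note that `IndexFlatPanorama` is not part of this PR; `IndexFlatL2` stands in for it here, and the parameter choices are arbitrary.

```cpp
// Illustrative sketch only: a fast, approximate base index plus an exact
// refinement stage. IndexFlatL2 stands in for a hypothetical IndexFlatPanorama.
#include <vector>

#include <faiss/IndexFlat.h>
#include <faiss/IndexIVFPQFastScan.h>
#include <faiss/IndexRefine.h>

void search_with_refinement(
        int d, size_t n, const float* xb, size_t nq, const float* xq, int k) {
    faiss::IndexFlatL2 quantizer(d);
    // Fast but low-accuracy first stage (a RaBitQ index would also fit here).
    faiss::IndexIVFPQFastScan base(
            &quantizer, d, /*nlist=*/1024, /*M=*/d / 2, /*nbits=*/4);
    // Exact second stage: only the candidate list gets full-precision distances,
    // so Panorama's level-wise pruning would operate on a small set of vectors.
    faiss::IndexFlatL2 refine(d);
    faiss::IndexRefine index(&base, &refine);
    index.k_factor = 4; // fetch 4*k candidates from the base index, then re-rank

    index.train(n, xb);
    index.add(n, xb);
    std::vector<float> D(nq * k);
    std::vector<faiss::idx_t> I(nq * k);
    index.search(nq, xq, k, D.data(), I.data());
}
```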
Please share any performance comparison you have with this code vs. the HNSWFlat implementation.
@mdouze thanks for the review
@AlSchlo is it worth allowing configuring a default …
@alexanderguzhva Excellent suggestion! So we actually used to have an epsilon knob there, but we ended up not talking about it in the paper. It's a knob that just adds confusion IMO and makes the workload more unpredictable. We did not study it in more detail as the paper was getting too dense.
Sorry for the delay on these PRs, I'm still conducting some benchmarking.
@mnorris11 No worries, and thank you so much for the reviews! As an update, after #4645 is confirmed, I have a local build of …
The benchmarks look good on my end! Regarding the …
Hi @mnorris11, to clarify: are you suggesting to make …
@AlSchlo I mean, is it possible to just have a … How about something like this patch file below? Feel free to note any cons or pitfalls that I'm missing. cc @mdouze if you have other opinions. IMO it is only a small style thing. After we decide one way or another, this one should be good to merge?
@mnorris11 That's a good suggestion; it makes it a bit cleaner and less invasive.
faiss/impl/HNSW.cpp (outdated)

```cpp
idx_t idx = index_array[i];
if (!sel || sel->is_member(idx)) {
    if (res.add_result(exact_distances[i], idx)) {
        threshold = res.threshold;
```
It seems like threshold is not used after being written to here?
Indeed, I did this to better mirror the original code. Please check my latest commit, where I made it a bit more idiomatic.
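As an illustration only (the actual commit may differ), a cleanup of the kind being discussed could drop the write to the unused local entirely, since the result handler already tracks its own threshold:

```cpp
// Sketch of the discussed cleanup, not the committed code: no local copy of
// the threshold; read res.threshold directly wherever pruning later needs it.
idx_t idx = index_array[i];
if (!sel || sel->is_member(idx)) {
    res.add_result(exact_distances[i], idx);
}
```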
(just realized this has a little bug which I will fix, condition seems wrong)
Fixed and wrote a test for it! Please check it out :) @mnorris11
Sorry to mention it after so long: I realized the test is cpp, not Python. Usually we prefer Python tests if possible, to exercise the SWIG layer. Can this be converted? Otherwise, tests are all passing internally and benchmarks look good. I will add Panorama for various indexes to the "Guidelines to choose an index" and "Index Factory" wikis after everything is merged.
Ah, I matched …
Done. I translated it with Claude and then reviewed the output manually. It seems quite accurate to me. @mnorris11
@mnorris11 merged this pull request in 9080fdb.
This pull request has been reverted by c36cd39. |
Summary: This PR introduces **Panorama** into `HNSWFlat`, following our [paper](https://arxiv.org/pdf/2510.00566). Panorama achieves up to **4× lower latency** on higher-dimensional data, making it a great option for medium-sized datasets that don't benefit much from quantization.

Below are some benchmarks on **SIFT-128**, **GIST-960**, and **synthetic 2048-dimensional** data. I recommend checking out the paper for more results. As expected, Panorama is not a silver bullet when combined with HNSW—it’s only worthwhile for high-dimensional data.

It might be worth considering, in the future, adding a function that dynamically sets the number of levels. However, this would require reorganizing the cumulative sums.

## SIFT-128

**Note:** SIFT-128 performs slightly worse here than in our paper because we use 8 levels, whereas the paper explored several level configurations. Eight levels introduce quite a bit of overhead for 128-dimensional data, but I kept it consistent across all benchmarks for comparison.

<img width="788" height="435" alt="bench_hnsw_flat_panorama_SIFT1M" src="https://github.com/user-attachments/assets/5004dee4-de0a-44e5-9031-582f7738a348" />

## GIST-960

<img width="787" height="435" alt="bench_hnsw_flat_panorama_GIST1M" src="https://github.com/user-attachments/assets/ba73d062-be14-44b0-9c46-ac1dafbfaed2" />

## Synthetic-2048

<img width="794" height="435" alt="bench_hnsw_flat_panorama_Synthetic2048D" src="https://github.com/user-attachments/assets/acf6233c-185b-4295-9c63-7a4b5b037619" />

Pull Request resolved: facebookresearch#4621

Reviewed By: mdouze

Differential Revision: D85902427

Pulled By: mnorris11

fbshipit-source-id: 4db9e950ce0c532494fa99ae93d39ccf06779b5d
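For intuition about the "cumulative sums" mentioned above, here is a rough illustrative sketch (not the PR's implementation) of level-wise L2 pruning: dimensions are split into blocks, per-block suffix norms are precomputed, and a candidate is abandoned as soon as the exact partial distance plus a lower bound on the remaining blocks exceeds the current threshold. All names and signatures below are hypothetical.

```cpp
// Illustrative sketch of level-wise pruning with precomputed suffix norms;
// this is NOT the code added by this PR, just the general idea.
// suffix_norm[l] = sqrt(sum of v[j]^2 over all dimensions in blocks l..nlevel-1),
// with suffix_norm[nlevel] == 0, precomputed once per stored vector (and per query).
#include <vector>

bool exceeds_threshold(
        const float* q,
        const float* x,
        const std::vector<float>& q_suffix_norm,
        const std::vector<float>& x_suffix_norm,
        int d,
        int nlevel,
        float threshold) { // current k-th best squared L2 distance
    const int block = d / nlevel;
    float partial = 0.0f; // exact squared distance over the blocks seen so far
    for (int l = 0; l < nlevel; l++) {
        const int begin = l * block;
        const int end = (l == nlevel - 1) ? d : begin + block;
        for (int j = begin; j < end; j++) {
            const float diff = q[j] - x[j];
            partial += diff * diff;
        }
        // Reverse triangle inequality: the remaining blocks contribute at least
        // (||q_rest|| - ||x_rest||)^2 to the squared distance.
        const float rest = q_suffix_norm[l + 1] - x_suffix_norm[l + 1];
        if (partial + rest * rest > threshold) {
            return true; // prune: this candidate cannot beat the k-th best
        }
    }
    return false; // partial now holds the exact squared distance
}
```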